Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data

Authors

  • J. S. Yu
  • S. Ongarello
  • R. Fiedler
  • X. W. Chen
  • Gianna Toffolo
  • Claudio Cobelli
  • Zlatko Trajanoski
Abstract

MOTIVATION: High-throughput and high-resolution mass spectrometry instruments are increasingly used for disease classification and therapeutic guidance. However, analysing the immense amounts of data they produce poses considerable challenges. We have therefore developed a novel method for dimensionality reduction and tested it on a published high-resolution SELDI-TOF ovarian cancer dataset.

RESULTS: We have developed a four-step strategy for data preprocessing based on: (1) binning, (2) the Kolmogorov-Smirnov test, (3) restriction of the coefficient of variation and (4) wavelet analysis. Support vector machines were subsequently used for classification. The method achieves an average sensitivity of 97.38% (sd = 0.0125) and an average specificity of 93.30% (sd = 0.0174) over 1000 independent k-fold cross-validations, where k = 2, ..., 10.

AVAILABILITY: The software is available to academic and non-commercial institutions.
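The four preprocessing steps listed in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' software. The binning rule (averaging a fixed number of consecutive points), the use of the KS statistic D with a cutoff `d_min` instead of a p-value, the rule of discarding a bin when its coefficient of variation exceeds `cv_max` in either class, the single-level Haar wavelet, and all function names and threshold values are assumptions made for illustration. The support vector machine and the k-fold cross-validation are omitted; the `sensitivity_specificity` helper only shows how the two reported metrics are defined.

```python
import math
from statistics import mean, pstdev


def bin_spectrum(intensities, width):
    """Step 1 (binning, assumed rule): average each run of `width` consecutive points."""
    return [mean(intensities[i:i + width])
            for i in range(0, len(intensities), width)]


def ks_statistic(a, b):
    """Step 2: two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every sample equal to x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def coefficient_of_variation(values):
    """Step 3 helper: population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m != 0 else math.inf


def haar_approximation(signal):
    """Step 4 (assumed wavelet): one Haar DWT level, approximation coefficients only."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd-length input by repeating the last value
    return [(signal[k] + signal[k + 1]) / math.sqrt(2)
            for k in range(0, len(signal), 2)]


def preprocess(cancer, control, width=2, d_min=0.5, cv_max=0.5):
    """Apply the four steps; width, d_min and cv_max are hypothetical placeholders."""
    cancer = [bin_spectrum(s, width) for s in cancer]
    control = [bin_spectrum(s, width) for s in control]
    keep = []
    for f in range(len(cancer[0])):  # assumes all spectra have the same length
        a = [s[f] for s in cancer]
        b = [s[f] for s in control]
        if ks_statistic(a, b) < d_min:
            continue  # the two classes look alike at this bin
        if coefficient_of_variation(a) > cv_max or coefficient_of_variation(b) > cv_max:
            continue  # too variable within a class to be trusted
        keep.append(f)

    def reduce_spectrum(spectrum):
        return haar_approximation([spectrum[f] for f in keep])

    return keep, [reduce_spectrum(s) for s in cancer], [reduce_spectrum(s) for s in control]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP); label 1 = cancer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy spectra (10 points each), invented purely for illustration.
    cancer = [[10, 12, 50, 52, 40, 42, 30, 31, 100, 100],
              [11, 10, 51, 49, 41, 39, 32, 30, 10, 10],
              [9, 11, 48, 50, 38, 40, 29, 31, 300, 300]]
    control = [[10, 11, 20, 21, 15, 16, 30, 30, 1, 1],
               [12, 9, 22, 19, 16, 14, 31, 29, 2, 2],
               [10, 10, 19, 20, 14, 15, 30, 31, 3, 3]]
    keep, reduced_cancer, reduced_control = preprocess(cancer, control)
    print(keep)  # → [1, 2]
    print(sensitivity_specificity([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1, 0]))
```

On the invented toy data, bins 0 and 3 are rejected by the KS cutoff and bin 4 by the coefficient-of-variation cutoff, so only bins 1 and 2 reach the wavelet step, where they are merged into a single approximation coefficient per spectrum.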


Similar resources

Dimensionality Reduction in Genomics and Proteomics

Finding reliable, meaningful patterns in data with high numbers of attributes can be extremely difficult. Feature selection helps us to decide what attributes or combination of attributes are most important for finding these patterns. In this chapter, we study feature selection methods for building classification models from high-throughput genomic (microarray) and proteomic (mass spectrometry)...

Feature Extraction for Classification of Proteomic Mass Spectra: A Comparative Study

To satisfy the ever growing need for effective screening and diagnostic tests, medical practitioners have turned their attention to high resolution, high throughput methods. One approach is to use mass spectrometry based methods for disease diagnosis. Effective diagnosis is achieved by classifying the mass spectra as belonging to healthy or diseased individuals. Unfortunately, the high resoluti...

Feature extraction and dimensionality reduction for mass spectrometry data

Mass spectrometry is being used to generate protein profiles from human serum, and proteomic data obtained from mass spectrometry have attracted great interest for the detection of early stage cancer. However, high dimensional mass spectrometry data cause considerable challenges. In this paper we propose a feature extraction algorithm based on wavelet analysis for high dimensional mass spectrom...

Novel Approaches to Visualization and Data Mining Reveals Diagnostic Information in the Low Amplitude Region of Serum Mass Spectra from Ovarian Cancer Patients

The ability to identify patterns of diagnostic signatures in proteomic data generated by high throughput mass spectrometry (MS) based serum analysis has recently generated much excitement and interest from the scientific community. These data sets can be very large, with high-resolution MS instrumentation producing 1-2 million data points per sample. Approaches to analyze mass spectral data usi...

A novel, high-throughput workflow for discovery and identification of serum carrier protein-bound peptide biomarker candidates in ovarian cancer samples.

BACKGROUND Most cases of ovarian cancer are detected at later stages when the 5-year survival is approximately 15%, but 5-year survival approaches 90% when the cancer is detected early (stage I). To use mass spectrometry (MS) of serum proteins for early detection, a seamless workflow is needed that provides an opportunity for rapid profiling along with direct identification of the underpinning ...

Journal:
  • Bioinformatics

Volume 21, Issue 10

Pages: -

Publication date: 2005